Local Educational Agencies (LEAs) typically use one of the following
procedures to select children to participate in compensatory education 
programs: 

1. They ask teachers to nominate children who need compensatory 
education services. 

2. They use standardized tests to identify children performing below 
average compared to the national norms. 

3. They use locally developed selection tests to identify children
lacking basic skills.

While each of these approaches has strengths and weaknesses, many districts
would prefer to use locally developed tests. These tests can be short, easy
to administer and tied directly to the instructional needs defined by the
district. However, locally developed needs assessment instruments have
unknown psychometric properties, and districts are reluctant to base
important selection decisions on these tests.

This paper describes four studies of the reliability and validity of needs
assessment instruments developed by the Taylor (Michigan) Public Schools.
Taylor is a predominantly working class suburb of Detroit and is the tenth
largest school district in the state with approximately 13,500 students.

Needs Assessment Instruments:

The instruments consist of separate tests for kindergarten, first and 
second grade. (There is also a third grade test that was not included in 
this study.) Copies of the tests appear in Appendix A.

Each test consists of thirteen or fourteen items that measure whether the
child possesses the cognitive and psychomotor skills that teachers expect of
students when a child enters a particular grade level. The tests are
individually administered to all children at the beginning of each school 
year by experienced teachers. 

The student's performance on each item is scored as either a "1" (100%
accuracy), a "2" ("Some difficulty"), or a "3" ("Poor performance"). The
number of 2's and 3's is used to determine whether the child should
participate in the compensatory education program. If the child has more
than five or six scores of "2" or "3" (the threshold depends on the grade
level), the child is considered for compensatory education services.
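The selection rule described above can be sketched in a few lines of Python. This is only an illustration, not the district's actual procedure: the function names and the sample record are hypothetical, and both thresholds mentioned in the text (five and six) are tried because the paper does not say which one applies to which grade.

```python
def count_unsatisfactory(item_scores):
    """Count the items scored 2 ("Some difficulty") or 3 ("Poor performance")."""
    return sum(1 for score in item_scores if score in (2, 3))


def considered_for_services(item_scores, threshold):
    """Return True when the number of 2's and 3's exceeds the grade's threshold."""
    return count_unsatisfactory(item_scores) > threshold


# Hypothetical 13-item record for one child (not data from the study).
record = [1, 2, 3, 1, 1, 2, 2, 1, 3, 1, 2, 1, 1]

print(count_unsatisfactory(record))          # six items scored 2 or 3
print(considered_for_services(record, 5))    # more than five -> True
print(considered_for_services(record, 6))    # not more than six -> False
```

The count returned by `count_unsatisfactory` is the same "score" used as the metric in the four studies, so a higher value means poorer performance.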


Brief Description of the Four Studies: 

This paper summarizes the methodology and findings of four studies of the 
validity and reliability of these needs assessment instruments. 

Study 1: Stability of Scores. This study used a classical test-retest
design to provide information about the stability of scores
obtained on these tests.

Study 2: Classification Stability. This study used the test-retest
data to examine whether students maintained their
classification of "needs compensatory education" or "does not
need compensatory education" over a two week period.

Study 3: Content Validity. This study examined the degree to which
these tests measure concepts considered important by
classroom teachers.

Study 4: Concurrent Validity. This study examined the relationship
between the scores of children on the needs assessment tests
and teacher judgment about each child's need for compensatory
education services.

The metric for analysis in these studies consists of the number of items on
which the students did not perform satisfactorily. For example, a "score"
of seven means the student failed seven items. Thus, the higher the
students' scores, the poorer their performance.



Confounding Variable: 

The authors are aware of at least one important uncontrolled variable that 
might distort the results of this study. 

Michigan regulations associated with its state-wide compensatory education 
program limit the number of children who can receive special services. The 
Taylor Schools, in an effort to operate the most effective program, adopted 
a student selection policy based on studies of the effect of early 
intervention on student performance. That is, the district believes it is 
important to identify children who are at academic risk while they are in
the lower elementary grades; it concentrates its compensatory education
effort on young children. Consequently, the district set cut-off scores on
these tests at a level that would identify all students likely to have
academic difficulty. This resulted in establishing low cut-off scores and
in selecting some students who do not need compensatory education services.
Teachers later had the opportunity to recommend that those children be 
dropped from the compensatory education program. 

As one would expect, this early intervention policy caused the district to 
set cut-off scores that identified a disproportionate number of students as
needing compensatory assistance. For example, 40% of the 965 students
enrolled in kindergarten were selected for the compensatory education
program. In first and second grades those percentages are 53% and 49%
respectively. 

The selection of this many children for the compensatory education program 
had two major effects on the present study. First, it forced the district 
to select children with relatively good scores to participate in the
program. As one expects on mastery tests, there were many students who
performed well on these measures, but the district set its cut-off scores
within the midst of this large group. That had a detrimental impact on the
reliability of the classifications of students into eligible and not
eligible groups. Second, this decision led to the selection of students
that teachers might not ordinarily recommend for compensatory education
services. This was particularly obvious in Study 4 in which teachers 
nominated far fewer children for compensatory education services than were 
admitted to the program based on their needs assessment scores. As 
mentioned earlier, the teachers later dropped students who should not 
participate in the compensatory education program. 



Study 1: Stability of Scores

Test score reliability is a prerequisite for meaningful scores. This study 
used a classical test-retest design to examine the stability of the scores 
obtained by students on the three needs assessment measures. 



Methodology: 

All students in regular kindergarten, first and second grade classes took
the needs assessment test in September, 1984. Twenty percent of the
students who completed the pre-test were randomly selected from the total
sample using a stratification procedure to ensure a proportionate number of
the students were selected from each school. Those students were re-tested
between two and three weeks later by the same examiner. Pearson
product-moment correlations were computed to estimate the test-retest
reliability of the needs assessment scores.
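As an illustration of the statistic named above, the following sketch computes a Pearson product-moment correlation between a first and second administration. The scores are hypothetical (the study's raw data are not reproduced in this paper), and the function name `pearson_r` is chosen here for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical numbers of failed items for ten children on the test and
# on the re-test given two to three weeks later (not data from the study).
test_scores = [0, 1, 0, 2, 7, 1, 0, 9, 3, 0]
retest_scores = [0, 0, 1, 1, 5, 2, 0, 6, 1, 0]

print(round(pearson_r(test_scores, retest_scores), 2))
```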



Findings: 

Table 1 presents the product-moment correlations of student test scores with
scores obtained on the re-test. Correlations between .70 and .75 suggest
that scores obtained on the tests are only modestly stable over a two to
three week interval and that the district must be particularly cautious
about using the results of this test to make decisions about the performance
of individual students.



TABLE 1: TEST-RETEST RELIABILITY

Grade      N       r      [heading illegible]

K         223     .73     1.7
1         228     .70     1.4
2         233     .75     1.4



There are at least two factors that contribute to these findings: 

1. The tests are short, having between 13 and 14 items. Since it is
unusual for a short test to generate reliable scores, the finding of modest
score reliability should be expected for these needs assessment instruments.

2. Mastery tests of this type do not lend themselves to traditional
correlational analysis (Gronlund, 1985). Examination of the score
distributions indicates that most students performed well on the tests (as
indicated by low mean scores on the two administrations of the test). In
essence, there are two sets of scores. Most students performed well on the
tests and their scores are clustered narrowly near the perfect score of
zero. A smaller group did not perform well on the tests and their scores
are distributed over a wide range. When these two sets of scores are
combined, the result is a highly skewed distribution of scores that masks
the restricted range of scores of the large group of successful students.
This restriction in the range of scores of one group of students predictably
reduces the correlation between the pre-test and post-test scores for the
total group.

In summary, this test-retest reliability analysis suggests the tests yield
results of low reliability. However, the procedures used in this
traditional norm referenced approach to test reliability are possibly not
appropriate for examining mastery tests.



Study 2: Classification Stability

The needs assessment tests developed by the district are selection measures;
they classify students into two groups: students needing compensatory
education assistance and those not needing assistance. Thus, these tests
can be treated as mastery measures that students either pass or fail.

Gronlund (1985) suggests that psychometricians examine the stability of a
mastery test by determining if the test is consistent in its ability to
classify a student as passing or failing. For lack of a better term, we
call this characteristic "classification stability": the degree to which a
test consistently classifies a student into the "needs help" and "does not
need help" categories. If we assume that students who need compensatory
education services at the end of September will also need those services two
to three weeks later, reliable tests should consistently classify students
as either needing help or not needing help over that time period.

Study 2 examined the classification stability of the needs assessment 
instruments. 



The reader is again reminded that low scores on the tests suggest high
levels of student performance.

Methodology:

Data for this study consist of a re-analysis of the data from Study 1, and
subjects in this study are the same randomly selected 20% of the students
attending kindergarten, first and second grade in the district. Each
student was classified as "eligible" for compensatory education services or
"not eligible" on both the test and the re-test administered two to three
weeks later. Those data were the basis for the analysis.

Findings:

Tables 2-4 present the results of these analyses, and Table 5 summarizes the
results of the studies at the three different grade levels. The metric
labeled "Consistency" in Table 5 is the percentage of children classified
into the same category ("eligible" or "not eligible") on both
administrations of the test.



TABLE 2: CLASSIFICATION STABILITY - KINDERGARTEN

                               ORIGINAL TEST

RE-TEST            Eligible     Not Eligible     Total

Eligible              60             11            71
                      68%             8%
                      85%            15%

Not Eligible          28            124           152
                      32%            92%
                      18%            82%

Total                 88            135           223

Cell Contents:   N
                 Column %
                 Row %

Chi-square = 85.7
df = 1
p < .01

TABLE 3: CLASSIFICATION STABILITY - GRADE 1

                               ORIGINAL TEST

RE-TEST            Eligible     Not Eligible     Total

Eligible              59             12            71
                      49%            11%
                      83%            17%

Not Eligible          62             95           157
                      51%            89%
                      39%            61%

Total                121            107           228

Cell Contents:   N
                 Column %
                 Row %

Chi-square = 35.6
df = 1
p < .01

TABLE 4: CLASSIFICATION STABILITY - GRADE 2

                               ORIGINAL TEST

RE-TEST            Eligible     Not Eligible     Total

Eligible              67             14            81
                      57%            12%
                      83%            17%

Not Eligible          51            101           152
                      43%            88%
                      34%            66%

Total                118            115           233

Cell Contents:   N
                 Column %
                 Row %

Chi-square = 49.1
df = 1
p < .01

TABLE 5: CLASSIFICATION STABILITY

                            % OF STUDENTS...

         Eligible On Test Who     Not Eligible On Test Who
         Were Eligible On         Were Not Eligible On
Grade    Re-Test                  Re-Test                    Consistency(1)

K              68%                      92%                      83%
1              49%                      89%                      68%
2              57%                      88%                      72%

(1) Consistency = (Number of Students Whose Eligibility Didn't Change) /
                  (Total Number of Students)
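
The three percentages reported in Table 5 can be recomputed from the cell counts in Tables 2-4. The sketch below is one way to do so; the function name `stability` and the tuple layout are illustrative choices, and the counts are those read from the tables above.

```python
def stability(both_eligible, lost_eligibility, gained_eligibility, both_not_eligible):
    """Classification stability figures for one grade's 2x2 test/re-test table."""
    eligible_on_test = both_eligible + lost_eligibility
    not_eligible_on_test = gained_eligibility + both_not_eligible
    total = eligible_on_test + not_eligible_on_test
    return {
        "eligible_retained": both_eligible / eligible_on_test,
        "not_eligible_retained": both_not_eligible / not_eligible_on_test,
        "consistency": (both_eligible + both_not_eligible) / total,
    }


# Cell counts read from Tables 2-4, in the order:
# (eligible on both, eligible then not eligible,
#  not eligible then eligible, not eligible on both)
cells = {
    "K": (60, 28, 11, 124),
    "1": (59, 62, 12, 95),
    "2": (67, 51, 14, 101),
}

summary = {grade: stability(*counts) for grade, counts in cells.items()}

for grade, result in summary.items():
    print(grade, {name: round(100 * value) for name, value in result.items()})
```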

The results suggest that the tests have only a modest ability to classify
students into these categories with any degree of stability. Overall,
approximately 25% of the students were classified into a different category
when they took the same test two to three weeks later.

These results mask differences that exist between the students classified as
"eligible" and those classified as "not eligible" based on the first
administration of the test. As suggested by the data in Table 5, students
classified as "not eligible" on the first administration of the test were
far more likely to retain their "not eligible" classifications than were
students classified as "eligible" on the first test.

Further examination of the data indicates that the re-test scores were
substantially lower (i.e., better) than the scores obtained on the first
administration of the measure. In kindergarten, 32% of the children who
qualified for the program on the first administration of the test scored too
low on the test to qualify for the program two to three weeks later.(1)
Perhaps this should be expected given the age of the children and the rate
at which they learn introductory concepts. However, 51% of the first grade
students who initially qualified for the compensatory education program did
not qualify when given the same test two weeks later. In second grade, 43%
of the students did not qualify for the program when re-tested.

Conversely, the results were stable for children who obtained good scores on
the first administration of the test. Most of the children deemed not
eligible for compensatory education on the first administration of the test
were also not eligible on the second administration of the measure.
Overall, only 10% of the children whose scores made them ineligible for the
program on the original administration of the test were eligible on the
re-test.

In general, one must conclude that while the tests provide reliable results
for children who pass the test, the results are not sufficiently reliable
for students who obtain poor scores on the measure. Since the purpose of
the test is to identify children who qualify for compensatory education
services, and since 43% of those identified as "qualified" on the first
administration of the test were "not qualified" two weeks later, we must
conclude that the tests do not provide a consistent indicator of who should
receive compensatory education services.
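
The pooled figures quoted in this discussion (about 43% of initially eligible students losing eligibility, about 10% of initially ineligible students gaining it) follow from summing the counts in Tables 2-4 across grades. A brief sketch, with variable names chosen here for illustration:

```python
# Per grade, read from Tables 2-4:
# (eligible on test but not on re-test, all eligible on test,
#  not eligible on test but eligible on re-test, all not eligible on test)
counts = {
    "K": (28, 88, 11, 135),
    "1": (62, 121, 12, 107),
    "2": (51, 118, 14, 115),
}

lost = sum(c[0] for c in counts.values())
eligible = sum(c[1] for c in counts.values())
gained = sum(c[2] for c in counts.values())
not_eligible = sum(c[3] for c in counts.values())

pct_lost_eligibility = 100 * lost / eligible
pct_gained_eligibility = 100 * gained / not_eligible
pct_changed = 100 * (lost + gained) / (eligible + not_eligible)

print(round(pct_lost_eligibility), round(pct_gained_eligibility), round(pct_changed))
```

Pooled this way, roughly 26% of all re-tested students changed category, in line with the "approximately 25%" cited earlier.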

The authors have reservations about generalizing these findings to other
compensatory education programs. As noted earlier, the Taylor Schools
consciously over-identified children to participate in the compensatory
education program at these lower grade levels. This led them to set cut-off
scores that were close to the middle of the narrow cluster of scores
obtained by students who performed well on the test. Thus the district
selected and rejected many students who were close to the cut-off score;
those students could change their position into "eligible" and "not
eligible" groups based on test score changes of one or two points. It is
likely that the district's decision had a detrimental impact on the tests'
ability to reliably classify students into the two categories.



(1) Readers might also suspect that part of the shift in test scores can be
explained by statistical regression toward the mean. However, the change
in number of students qualifying for the program is so dramatic that it is
unlikely to be explainable by this phenomenon.

The authors are now conducting a study to examine the impact of using
different cut-off scores on the classification stability of the needs
assessment test results.



Study 3: Content Validity

The needs assessment tests were designed to measure whether students
mastered the basic skills and concepts that are expected of children before
they enter kindergarten, first and second grades. Study 3 examined (a)
whether teachers believe the skills and concepts measured on the needs
assessment instruments are important for the educational development of
children, and (b) the degree to which the locally developed needs assessment
tests measured those concepts and skills.



Methodology: 

Data for this study consist of teacher responses to two questionnaires. One
questionnaire measured teachers' perceptions of the importance of the skills
tested on the needs assessment instruments; the second examined whether
teachers thought the tests measured those skills.

Items in the questionnaires were based on the skills being measured at each
grade level. Since the intent of a test item is not always obvious from the
item itself, the authors of the needs assessment instruments described the
skill they were trying to measure in each item. Two questionnaires were
developed for each grade level based on these lists of skills. One survey
asked teachers to rate the extent to which each skill is important for
students entering their grade level. The second survey presented a sample
of each test item and a list of the related skills; the respondents
indicated the degree to which each test item measured the related skill.
Copies of these instruments appear in Appendix B.

All 83 kindergarten, first and second grade teachers in the district
received a copy of the instruments. Seventy-one teachers returned their
questionnaires. Four of the questionnaires were discarded: the authors
considered any questionnaire where all the ratings were identical as
invalid, and they also eliminated one questionnaire that had only four
responses. Thus 67 questionnaires were used in the final analysis, for an
overall return ratio of 81%. Table 6 presents the number of questionnaires
distributed and the return rate for each grade level.

TABLE 6: RESPONSE RATE FOR TEACHER QUESTIONNAIRES

                                      NUMBER
          NUMBER        NUMBER        CONSIDERED     % OF VALID
GRADE     DISTRIBUTED   RETURNED      VALID          QUESTIONNAIRES

K             19            15            15              79%
1             31            29            26              84%
2             33            27            26              79%



Findings:

Tables 7 through 9 present the results of the two components of this study;
Tables 10 and 11 summarize those results.

TABLE 7: CONTENT VALIDITY OF NEEDS ASSESSMENT MEASURES - KINDERGARTEN

MEDIAN RATING

TEST ITEM

Items #1 through #7 (individual item numbers illegible):
      Knows body parts
      Fine motor skills - draw picture
      Recognizes letters
      Prints first name
      Knows name
      Recognizes letters
      Knows abc's
#8  - Knows address
#9  - Answers with sentence
#10 - Repeats 4 words
#11 - Counts 1 to 10
#12 - Counts 4 objects
#13 - 1 to 1 correspondence
#14 - Color names
#15 - Knows body parts
#16 - Hops
#17 - Balances on 1 foot
#18 - Eye-hand coordination
#19 - Copies shapes

IMPORTANCE OF SKILL(1): individual median ratings illegible; Table 10
summarizes them.

MEASURE OF SKILL(2): median ratings in the order they appear in the source
(one value illegible; alignment with items not recoverable):
3, 4, 2, 5, 5, 4, 5, 4, 5, 5, 5, 4, 5, 5, 4, 4, 5, 5

(1) Rating of Importance:
    1 = Very unimportant
    2 = Somewhat unimportant
    3 = Neutral - neither unimportant or important
    4 = Somewhat important
    5 = Very important

(2) Rating of Content Validity:
    1 = Definitely does not measure the concept
    2 = Poor measure of the concept
    3 = Neutral
    4 = Good measure of the concept
    5 = Excellent measure of the concept



TABLE 8: CONTENT VALIDITY OF NEEDS ASSESSMENT MEASURES - GRADE ONE

MEDIAN RATING

                                            IMPORTANCE     MEASURE
TEST ITEM                                   OF SKILL(1)    OF SKILL(2)

#1  - Prints name                                5              5
#2  - Recognizes letters                         5              2
#3  - Knows body parts                           4              3.5
#4  - Fine motor skills - draws picture          4              4
#5  - Knows complete address                     4              5
#6  - Recites the alphabet                       5              5
#7  - Knows upper case letters                   5              5
#8  - Knows lower case letters                   5              5
#9  - Remembers number sequence                  4              5
#10 - Finds 3 matching letters                   5              5
#11 - Counts to 20                               5              5
#12 - Recognizes numbers 1-10                    5              5
#13 - Selects 6 objects from 10                  5              5
#14 - 1 to 1 correspondence                      5              2
#15 - Knows geometric shapes                     4              5
#16 - Knows missing numbers to 10                5              5
#17 - Skips                                      4              5

(1) Rating of Importance:
    1 = Very unimportant
    2 = Somewhat unimportant
    3 = Neutral - neither unimportant or important
    4 = Somewhat important
    5 = Very important

(2) Rating of Content Validity:
    1 = Definitely does not measure the concept
    2 = Poor measure of the concept
    3 = Neutral
    4 = Good measure of the concept
    5 = Excellent measure of the concept



TABLE 9: CONTENT VALIDITY OF NEEDS ASSESSMENT MEASURES - GRADE TWO

MEDIAN RATING

                                            IMPORTANCE     MEASURE
TEST ITEM                                   OF SKILL(1)    OF SKILL(2)

#1  - Writes sentences                           4              4
#2  - Prints letters                             5              5
#3  - Knows body parts                           5              4
#4  - Fine motor skills - draws picture          4              4
#5  - Knows missing numbers to 99                4              4
#6  - Adds numbers w/o regrouping                5              5
#7  - Subtracts w/o regrouping                   5              5
#8  - Knows address, phone #, birthday           5              5
#9  - Reads color words                          5              5
#10 - Says sounds                                5              5
#11 - Says short vowel sounds                    4.5            5
#12 - Reads simple sentences                     5              4
#13 - Reads analog time - hour                   4              5
#14 - Reads numerals to 99                       5              5
#15 - Solves addition word problems              4              5

(1) Rating of Importance:
    1 = Very unimportant
    2 = Somewhat unimportant
    3 = Neutral - neither unimportant or important
    4 = Somewhat important
    5 = Very important

(2) Rating of Content Validity:
    1 = Definitely does not measure the concept
    2 = Poor measure of the concept
    3 = Neutral
    4 = Good measure of the concept
    5 = Excellent measure of the concept

TABLE 10: SUMMARY OF MEDIAN RATINGS ON VALIDITY SURVEY -
IMPORTANCE OF SKILLS

                                        Grade     Grade
                              Kdg.      One       Two

1. Very Unimportant            0(1)      0         0
2. Somewhat Unimportant        0         0         0
3. Neutral                     0         0         0
4. Somewhat Important         15         6         6
5. Very Important              4        11         9

(1) Interpretation: There were no skills measured on the
    Kindergarten test that received a
    median rating of 1 (very unimportant) on the
    teacher survey instrument.

TABLE 11: SUMMARY OF MEDIAN RATINGS ON VALIDITY SURVEY -
ADEQUACY OF MEASURE

                                        Grade     Grade
                              Kdg.      One       Two

1. Definitely doesn't          0(1)      0         0
   measure concept
2. Poor measure of             1         2         0
   concept
3. Neutral                     1         1         0
4. Good measure of             6         1         5
   concept
5. Excellent measure of       11        13        10
   the concept

(1) Interpretation: There were no items on the Kindergarten
    test that had a median rating of "1"
    from teachers who were asked to rate
    the test's ability to measure the
    concept.

The data in Table 10 indicate that all 51 skills were rated "somewhat
important" or "very important" by the teachers. Two thirds of the skills
measured on the first and second grade tests were rated "very important" for
incoming students at those grade levels. (The authors speculate that the
differences between the kindergarten results and the findings for grades one
and two reflect the lack of consensus that exists in the profession about
the skills that should be brought to school by entering kindergarten
students.) Overall, these data suggest that the tests measure skills
considered important by teachers.

The data in Table 11 indicate the teachers believe the items were good to
excellent measures of the skills they considered important for entering
students. Sixty-seven percent of the items were rated as "excellent
measures of the concept" and an additional 24% were rated as "good measures
of the concept" by classroom teachers. Only 6% of the items were considered
"poor measures of the concept".

The data summarized in Tables 7 through 11 suggest that the classroom
teachers believe the tests do a good job of measuring concepts they consider
important.
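
The percentages cited for Table 11 can be checked by pooling the item counts across the three grade levels. The sketch below assumes the Table 11 counts are read as (Kindergarten, Grade One, Grade Two) = (0, 0, 0), (1, 2, 0), (1, 1, 0), (6, 1, 5) and (11, 13, 10); that reading is an interpretation of the printed table, and the variable names are illustrative.

```python
# Number of items at each median adequacy rating, by grade (Kdg., One, Two),
# as read from Table 11.
adequacy_counts = {
    "Definitely doesn't measure concept": (0, 0, 0),
    "Poor measure of concept": (1, 2, 0),
    "Neutral": (1, 1, 0),
    "Good measure of concept": (6, 1, 5),
    "Excellent measure of the concept": (11, 13, 10),
}

total_items = sum(sum(by_grade) for by_grade in adequacy_counts.values())

percentages = {
    label: round(100 * sum(by_grade) / total_items)
    for label, by_grade in adequacy_counts.items()
}

print(total_items)
for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

Under this reading the pooled total is 51 items, matching the 51 skills mentioned in the discussion of Table 10.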

Study 4: Concurrent Validity

This study examined the relationship between each child's needs assessment
score and teacher judgment about his/her need for compensatory education.
It is based on the assumption that teachers can identify children with
significant academic needs. If we accept that assumption, and if locally
developed tests measure those needs, there should be a meaningful
correlation between students' scores and the teachers' ratings of their need
for compensatory services.

Methodology:

Every kindergarten, first and second grade teacher in the district was asked
to select those children in the class needing compensatory education
services. These data and each child's test performance were used as the
bases for two analyses:

1. A point bi-serial correlational analysis to examine the relationship
between a dichotomous variable (the teacher's judgment of whether the child
should be in the compensatory education program) and a continuous variable
(the child's test score).

2. A chi-square analysis to determine if teacher judgments about who
should receive compensatory education services correspond with the results
of the needs assessment tests.
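
To make the first analysis concrete, the sketch below computes a point bi-serial correlation from its textbook formula: the difference between the two group means, divided by the standard deviation of all scores, multiplied by the square root of p times q. The data are hypothetical (child-level records are not given in this paper) and the function name is illustrative.

```python
import math


def point_biserial(nominated, scores):
    """Point bi-serial r between a 0/1 variable and a continuous score."""
    n = len(scores)
    group1 = [s for flag, s in zip(nominated, scores) if flag == 1]
    group0 = [s for flag, s in zip(nominated, scores) if flag == 0]
    p = len(group1) / n          # proportion nominated
    q = 1.0 - p                  # proportion not nominated
    mean_all = sum(scores) / n
    # Standard deviation of all scores, using n (not n - 1) in the denominator.
    sd_all = math.sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    mean_diff = sum(group1) / len(group1) - sum(group0) / len(group0)
    return mean_diff / sd_all * math.sqrt(p * q)


# Hypothetical data: 1 = teacher nominated the child, 0 = not nominated;
# scores are numbers of failed items (higher means poorer performance).
nominated = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [9, 7, 8, 2, 3, 1, 4, 6]

r_pb = point_biserial(nominated, scores)
print(round(r_pb, 2))
```

The result is numerically the same as a Pearson correlation computed with the nomination coded as 0 and 1.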

Findings - Correlational Analysis

Table 12 summarizes the correlations between teachers' nominations of
students for compensatory education and student test scores on the needs
assessment measures. While the p-values suggest a relationship between
student scores on the locally developed needs assessment tests and teacher
nominations, the correlations are modest.



TABLE 12: POINT BI-SERIAL CORRELATIONS BETWEEN TEACHER NOMINATIONS
AND STUDENT SCORES

Grade      N       r       df       p

K         858     .42     856     <.01
1         884     .42     882     <.01
2         894     .50     892     <.01



Findings - Chi-square Analysis

Tables 13-15 present cross tabulations comparing teacher judgment of student
needs for compensatory education and the selection of students into the
program based on the needs assessment instruments. Each child was
classified into one of four cells. For example, the upper left-hand cell
indicates the number of students who were eligible for compensatory
education based on the needs assessment test results and were also
recommended for compensatory education by their teacher. Each cell
contains: (a) the number of students in that cell, (b) the number of
students "expected" in that cell (using the standard chi-square technique to
generate expected cell frequencies) and, (c) the proportion of students from
that column in the cell. The last figure reflects the degree of agreement
between the test results and teacher judgment.
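
The expected cell frequencies and the chi-square statistic can be reproduced from the observed counts. The sketch below uses the kindergarten counts from Table 13. The paper does not say whether a continuity correction was applied; applying Yates' correction is an assumption made here because it reproduces the reported values, and the function names are illustrative.

```python
def expected_counts(table):
    """Expected cell frequencies: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


def chi_square_2x2(table, yates=True):
    """Chi-square for a 2x2 table, optionally with Yates' continuity correction."""
    statistic = 0.0
    for obs_row, exp_row in zip(table, expected_counts(table)):
        for observed, expected in zip(obs_row, exp_row):
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            statistic += diff ** 2 / expected
    return statistic


# Kindergarten counts from Table 13.
# Rows: nominated, not nominated. Columns: eligible, not eligible.
kindergarten = [[150, 66], [180, 461]]

print([[round(e) for e in row] for row in expected_counts(kindergarten)])
print(round(chi_square_2x2(kindergarten)))
```

The rounded expected counts correspond to the parenthesized values in Table 13.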

TABLE 13: COMPARISON BETWEEN TEST RESULTS AND TEACHER
NOMINATIONS FOR COMPENSATORY EDUCATION - KINDERGARTEN

                               TEST RESULTS

                   Eligible     Not Eligible     Total

Nominated            150             66           216
                     (83)          (133)
                     45%            13%

Not Nominated        180            461           641
                    (247)          (394)
                     55%            87%

Total                330            527           857

Cell Contents:   N
                 (Expected N)
                 Column %

Chi-square = 115
df = 1
p < .01

TABLE 14: COMPARISON BETWEEN TEST RESULTS AND TEACHER
NOMINATIONS FOR COMPENSATORY EDUCATION - GRADE 1

                               TEST RESULTS

                   Eligible     Not Eligible     Total

Nominated            234             70           304
                    (161)          (143)
                     50%            17%

Not Nominated        235            345           580
                    (308)          (272)
                     50%            83%

Total                469            415           884

Cell Contents:   N
                 (Expected N)
                 Column %

Chi-square = 105
df = 1
p < .01

TABLE 15: COMPARISON BETWEEN TEST RESULTS AND TEACHER
NOMINATIONS FOR COMPENSATORY EDUCATION - GRADE 2

                               TEST RESULTS

                   Eligible     Not Eligible     Total

Nominated            240             70           310
                    (153)          (157)
                     54%            15%

Not Nominated        201            383           584
                    (288)          (296)
                     46%            85%

Total                441            453           894

Cell Contents:   N
                 (Expected N)
                 Column %

Chi-square = 148
df = 1
p < .01

All three analyses yield chi-squares associated with p-values of <.01.
Consequently, we must reject the null hypothesis and conclude that the
differences between these two bases for selection into the compensatory
education program are unlikely to be attributable to chance.

If we accept the assumption that teachers can identify children who need
compensatory education services, these data support three conclusions:

1. There is a meaningful difference between teacher judgment about who
should be in the program and the students who were actually selected for the
program using the needs assessment test results. Overall, approximately 50%
of the students selected to participate in the compensatory education
program based on test results were not nominated for the program by their
teachers.

2. Teachers nominated fewer children for the compensatory education program
than were selected by the needs assessment tests. For example, Table 14
shows that while 469 students were selected into the first grade
compensatory education program based on their needs assessment tests,
teachers recommended only 304 children for that program.

3. Teachers generally concurred with the results of the needs assessment
tests for students who were not eligible for the program. Overall, the
teachers indicated that approximately 85% of the children excluded from the
program based on their needs assessment test scores should not be in the
program. However, a small but meaningful number of students who were
nominated for the program by their teachers did not have test scores that
qualified them for the program.
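
The pooled percentages in conclusions 1 and 3 can be recomputed from the cell counts in Tables 13-15. A short sketch, with names chosen here for illustration:

```python
# Per grade, from Tables 13-15, in the order:
# (nominated and eligible, not nominated and eligible,
#  nominated and not eligible, not nominated and not eligible)
agreement_cells = {
    "K": (150, 180, 66, 461),
    "1": (234, 235, 70, 345),
    "2": (240, 201, 70, 383),
}

eligible_total = sum(c[0] + c[1] for c in agreement_cells.values())
eligible_not_nominated = sum(c[1] for c in agreement_cells.values())
not_eligible_total = sum(c[2] + c[3] for c in agreement_cells.values())
not_eligible_not_nominated = sum(c[3] for c in agreement_cells.values())

# Share of test-selected students whom teachers did not nominate.
pct_selected_not_nominated = 100 * eligible_not_nominated / eligible_total
# Share of test-excluded students whom teachers also did not nominate.
pct_excluded_agreed = 100 * not_eligible_not_nominated / not_eligible_total

# Teacher nominations versus test selections, by grade.
nominated_by_grade = {g: c[0] + c[2] for g, c in agreement_cells.items()}
selected_by_grade = {g: c[0] + c[1] for g, c in agreement_cells.items()}

print(round(pct_selected_not_nominated), round(pct_excluded_agreed))
print(nominated_by_grade, selected_by_grade)
```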

In part, these results reflect the policy decision by the Taylor Schools to
provide compensatory education services to a large group of students in the
early elementary grades. As described earlier, that decision led to (a) the
acceptance of significantly more students into the program than would be
nominated by classroom teachers, and (b) use of cut-off scores that were
closer to the average score. Use of the lower (i.e., less restrictive)
cut-off score led to many students being admitted or rejected from the
program based on a one or two point difference from the cut-off score. This
suggests that the findings of this study might be different if the district
selected a cut-off score only slightly higher or lower than the one actually
used. The researchers are presently conducting a series of studies on the
impact of using different cut-off scores on the reliability and concurrent
validity of the needs assessment instruments.

Summary:

This paper describes four studies of some of the psychometric properties of
short, locally developed needs assessment instruments. These tests were
designed to help the Taylor Public Schools select students for its
compensatory education programs. The studies, which focused on the
stability of student scores, classification stability, content validity and
concurrent validity, support the following conclusions:

1. The test scores are only modestly stable over a period of two to three
weeks.

2. The tests did not reliably classify students into the "eligible" for
compensatory education and "not eligible" groups. Approximately 43% of the
students classified as "eligible" on the first administration of the test
were "not eligible" when re-tested on the same measure two to three weeks
later.

3. Classroom teachers believe the tests do a good job of measuring skills
they consider important for incoming students. The teachers rated all the
skills tested by these items as "somewhat important" or "very important",
and 91% of the items were considered "good" or "excellent" measures of those
concepts.

4. There is only a modest relationship between a teacher's belief that a
student should participate in the compensatory education program and the
test score obtained by that student. The correlations between teacher
nomination and test scores are low, and approximately 50% of the students
selected for the program based on the needs assessment test scores were not
nominated for the program by their teachers. The researchers describe how
using inappropriate cut-off scores might affect the findings of these
studies.



Conclusions: 

This work raises several issues that merit further examination. First, the
researchers found significant inconsistencies between teacher judgments
about student need for compensatory education and the results of the
district's needs assessment instruments. If we assume that teacher judgment
is a reliable and valid indicator of student need, then these findings
question the efficacy of the locally developed needs assessment instruments.
But if teacher judgment is unstable or invalid, we cannot conclude that the
needs assessment instruments are faulty. Certainly the issue of teacher
nomination for compensatory education needs further study. If teachers
generate appropriate lists of children to receive compensatory education
services, perhaps districts should place greater emphasis on teacher
judgment instead of emphasizing the use of tests to identify participants.

Second is the issue of the impact of using particular cut-off scores on the
reliability of selection instruments. The researchers are reanalyzing the
Taylor data to determine the impact of using different cut-off scores on the
classification stability of the instruments. They expect that using cut-off
scores further from the median will result in improved classification
stability.

Third is the disparity between teacher judgment about the quality of the
items on the tests and the generally unreliable results yielded by the
measures using the present cut-off scores. Lack of test reliability sets a
cap on the validity of scores generated by a measure. The fact that
teachers believe the tests do a good job of measuring important skills
should be viewed with caution if, in fact, a test yields generally unstable
results. These findings suggest that districts examine the reliability of
their selection test scores, even if the staff believes the measures are
doing a good job of measuring important skills. However, it should be
recognized that the judgment of these teachers might be correct; the tests
might yield valid results if more appropriate cut-off scores are
established.



Finally, there is the issue of how to best select children for compensatory
education programs. As suggested earlier, using locally developed needs
assessment instruments is one of several alternative procedures. The
effectiveness of these instruments should be judged in comparison to the
alternatives. For example, norm referenced achievement tests and teacher
rating instruments are widely used for student selection into compensatory
education programs. Additional work should be done to study the
classification stability of these measures when they are used as a basis for
student selection.